Structurally constrained and pathology-aware convolutional transformer generative adversarial network for virtual histology staining of human coronary optical coherence tomography images

Abstract. Significance There is a significant need for the generation of virtual histological information from coronary optical coherence tomography (OCT) images to better guide the treatment of coronary artery disease (CAD). However, existing methods either require a large pixel-wise paired training dataset or have limited capability to map pathological regions. Aim The aim of this work is to generate virtual histological information from coronary OCT images, without a pixel-wise paired training dataset while capable of providing pathological patterns. Approach We design a structurally constrained, pathology-aware, transformer generative adversarial network, namely structurally constrained pathology-aware convolutional transformer generative adversarial network (SCPAT-GAN), to generate virtual stained H&E histology from OCT images. We quantitatively evaluate the quality of virtual stained histology images by measuring the Fréchet inception distance (FID) and perceptual hash value (PHV). Moreover, we invite experienced pathologists to evaluate the virtual stained images. Furthermore, we visually inspect the virtual stained image generated by SCPAT-GAN. Also, we perform an ablation study to validate the design of the proposed SCPAT-GAN. Finally, we demonstrate 3D virtual stained histology images. Results Compared to previous research, the proposed SCPAT-GAN achieves better FID and PHV scores. The visual inspection suggests that the virtual histology images generated by SCPAT-GAN resemble both normal and pathological features without artifacts. As confirmed by the pathologists, the virtual stained images have good quality compared to real histology images. The ablation study confirms the effectiveness of the combination of proposed pathological awareness and structural constraining modules. Conclusions The proposed SCPAT-GAN is the first to demonstrate the feasibility of generating both normal and pathological patterns without pixel-wisely supervised training. We expect the SCPAT-GAN to assist in the clinical evaluation of treating the CAD by providing 2D and 3D histopathological visualizations.


Introduction
Coronary artery disease (CAD) is the narrowing of coronary arteries caused by a build-up of atherosclerotic plaques.As the most common type of heart disease, CAD leads to one in seven deaths in the United States. 1 Optical coherence tomography (OCT) has been recognized as a valuable tool for imaging coronary tissue structures due to its high-resolution capabilities. 2owever, real-time interpretation of OCT images requires a significant amount of expertise and prior training.Additionally, the power of OCT interpretation, especially of the pathological region, is hindered by the lack of histopathological correlation.At present, direct histopathological analysis requires an invasive and time-consuming evaluation that involves post-mortem tissue examination.The use of multiple reagents in histopathology can also lead to detrimental effects on tissue imaging.Histopathological analysis is not suitable for clinical use in patients, who require real-time tissue characterization of coronary arteries.
Incorporating histopathological visualization into real-time OCT imaging holds great potential to complement OCT with histopathological visualization.A typical example of generating virtual stained histology images from OCT images of human coronary arteries is shown in Fig. 1.To date, there are limited frameworks developed to generate virtual stained histology from OCT images. 3,4Winetraub et al. used Pix2Pix Generative Adversarial Networks (GANs) to generate virtual stained hematoxylin and eosin (H&E) histology for human skin tissues. 3However, Pix2Pix GAN for virtual staining requires a pixel-wisely paired OCT and H&E image dataset.The creation of a pixel-wisely paired dataset demands a significant investment of resources and labor, including the embedding of samples in fluorescent gel, photo-bleaching, and manual fine alignment. 3Such a method also lacks generalizability to blood vessels, which are deformable soft tissue.Our previous method 4 demonstrates the capability to segment the three-layer structure (i.e., intima, media, and adventitia) in both OCT and H&E images, thereby generating virtual stained images optimized for different layers in human coronary.However, current performance has not been optimal if there are pathological patterns, such as calcium and lipid accumulation, that alter the typical three-layer structure of human coronary arteries.
To generate pathological-related regions from an unpaired dataset, we propose a structurally constrained pathology-aware convolutional transformer GAN (SCPAT-GAN) to generate virtually stained H&E histology images from OCT images.The proposed SCPAT-GAN incorporates two key components to enhance image quality for both normal and pathological coronary samples: a structural constraining module and a pathology awareness module.In summary, our main contributions include the following.The convolutional transformer generators (G O→H and G H→O ) take advantage of U-Net 5 like structure to extract multi-scale features.The multi-scale features are sent to Swin transformer block (STB) and structural constraint and pathology aware (SCPA) module.The STB is a deep neural network architecture that employs multiple residual Swin transformer sub-blocks (RSTBs) to extract features from input data.The RSTBs contain various Swin transformer layers (STLs) 6 that facilitate local attention and cross-window interaction learning.The feature extraction process of RSTBs is expressed as: by partitioning the input into non-overlapping windows of N patches, where HW N is the total number of the windows.For a local window feature X ∈ R, the query Q, key K, and value V matrices are computed as Q ¼ XP Q , K ¼ XP K , and V ¼ XP V , where Q, K, V ∈ R N×d .The P Q , P K , and P V are projection matrices shared across different windows.The self-attention of each head can be calculated as: AttentionðQ; K; VÞ ¼ SoftMaxð QK T ffiffi d p þ BÞV, where d denotes the query dimension; N stands for the number of patches in a window; and B ∈ R ð2N−1Þ×ð2Nþ1Þ .

Structural constraining and pathology awareness
The SCPA module is based on a transformer encoder-decoder architecture, which guides the virtual staining procedure.The SCPA module performs structural constraining and pathology awareness functions by segmenting the human coronary layers and classifying the types of coronary samples (normal or pathological).The multi-scale features are split into a sequence of patches x ¼ ½x 1 ; : : : ; x N ∈ R N×P 2 ×C , where ðP; PÞ stands for the patch size, N represents for the number of patches, and C is the number of channels of the multi-scale features.The patches are flattened and then linearly projected to an embedding sequence x 0 ¼ ½E x 1 ; : : : ; E x N ∈ R N×d , where d is the embedding dimension.Learnable position embeddings pos ¼ ½pos 1 ; : : : ; pos N ∈ R N×d are added to the sequence of patch embeddings to generate the tokens z 0 ¼ x 0 þ pos for Encoder.The Encoder maps the input sequence z 0 to z L ¼ ½z L 1 ; : : : ; z L N , which is an encoding sequence containing contextualized information of multi-scale features.
The SCPA module is designed to be aware of pathology patterns as well as maintain and constrain the normal structure of coronary samples.In the case of normal coronary samples, the z L is decoded to a segmentation map s ∈ R H×W×K , where K ¼ 3 and represents the three-layer structure of human coronary arteries.The segmentation map is acquired by the SCPA module, taking the scalar production between patch embeddings z M and class embeddings c: Segmentaion ¼ z M c T , where z M is acquired by decoding z L , and c is acquired by decoding a set of three randomly initialized learnable class embeddings [cls Intima , cls Media , cls Adventitia ] corresponding to the three coronary layers.In the case of diseased coronary samples, the patch embeddings z M are sent to a two-level MLP for classification between normal and pathological coronary images: Classification ¼ MLPðz M Þ.Also, the patch embeddings z M is concatenated to the extracted features from STB and then merged and up-sampled for OCT → Histology and Histology → OCT conversion.

Loss function
The loss function L of SCPAT-GAN consists of five terms, which are adversarial loss L adv , cycleconsistency loss L cycle , embedding loss L embedding , structural constraint loss L SC , pathology awareness loss L PA E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 4 ; 2 6 1 We follow the definition of L adv and L cycle made by Zhu et al. 7 and the definition of L embedding made by Liu et al. 8 α, β, γ, and ι are hyper-parameters.G O→H and G H→O are two generators that generate virtual histology images from OCT images and virtual OCT images from histology images respectively.G SC O→H , G SC H→O , G PA O→H , and G PA H→O are the SCPA modules for performing structural constraining and pathology awareness functions in the generators.The L SC is implemented by segmentation loss E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 4 ; 1 0 9 where S H and S O stand for the number of pixels in segmentation maps.y n;c H and y n;c O are the ground-truth pixel labels of different coronary layers for H&E and OCT images, respectively.C stands for the number of categories of the coronary layers (c ¼ 3).The L PA is implemented by classification loss E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 7 ; 6 8 8 where y p H and y p O are the ground-truth labels for pathology samples.We aim to solve the following minmax optimization problem E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 7 ; 6 2 1 3 Experiments 3.1 Experimental Settings

Experimental dataset
Human coronary samples were collected from the School of Medicine at the University of Alabama at Birmingham (UAB).Specimens were imaged via a commercial OCT system (Thorlabs Ganymede, Newton, New Jersey).A total of 194 OCT images were collected from 23 patients with an imaging depth of 2.56 mm. 9 The pixel size was 2 μm × 2 μm within a B-scan.
The width of the images ranged from 2 mm to 4 mm depending on the size of sample.After OCT imaging, samples were processed for H&E histology at UAB.We rescale the H&E images in the Aperio ImageScope software to enforce a pixel size of 2 μm × 2 μm.Among the dataset, 112 OCT images are from normal samples with the three-layer structure (i.e., intima, media, and adventitia); 82 OCT and H&E images contain pathological patterns.At the pixel level, we pixel-wisely label the structure (e.g., the layer structure) in a subset of OCT and H&E images for training purposes.At the image level, we label each OCT or H&E image as the pathological OCT or normal.The OCT and H&E images are divided into non-overlap patches with a size of 368 × 368.We randomly flip the patches from left to right for data augmentation.The training set contains 4297 OCT image patches and 4297 H&E image patches.

Implementation details
We adopt three convolution and transpose convolution layers with a stride of two for building a U-Net like structure for generating multi-scale feature maps.For the STB, we follow the design in our previous work. 6Our design of SCPA module is inspired by the Segmenter model. 10But different from the Segmenter, 10 we design the SCPA module to be capable of performing both segmentation and classification tasks, which suits our need for structural constraining and pathology awareness functions during virtual staining.The SCPAT-GAN is implemented by Pytorch.For training, the hyperparameters α, β, γ, and ι are set to 1, 0.2, 5, and 5 empirically.The pixel values of OCT and H&E images are scaled to [0, 1].The batch size is 9.The learning rate is initialized as 10 −4 , followed by a linearly decaying decay for every 2 epochs.In total, the SCPAT-GAN is trained 10,000 epochs to ensure convergence.The experiments are carried out on an RTX A6000 GPU.(5)

Metrics
where μðG O→H ðOÞÞ and μðHÞ are the magnitudes of the virtual stained and real histology images; Tr is the trace of the matrix; P G O→H ðOÞ and P H are the covariance matrix of the virtual stained and real histology images.The PHV is defined as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 1 1 4 ; 6 5 6 where N is the total number of extracted featuremap, F i represents the featuremap extracted from i'th layer of ResNet-101, avg is the average pooling operation that turns 3-D features into 1-D features, H is the unit step function, and T is a preset threshold.We use the three variations of PHV scores (i ¼ 1, PHV1), (i ¼ 2, PHV2), and (i ¼ 3, PHV3) which are extracted from different levels i of ResNet-101.We set T to be 0.02.Also, we designed a protocol to involve two pathologists (Dr.Silvio H. Litovsky, referred to as pathologist A; and Dr. Charles C. Marboe, referred to as pathologist B) to evaluate the quality of the virtual stained H&E images.Real and virtual stained H&E images are given to the pathologists, who are blinded to the true labels, to make predictions.The two pathologists work independently from each other.We compare the prediction results from the pathologist with the true labels, following the setup of the visual Turing test. 12,132 Results and Discussion

Quantitative analysis
The quantitative results (calculated by three-fold cross-validation) of SCPAT-GAN, as well as two start-of-art methods, for generating virtual stained H&E are shown in Table 1.Compared to our previous method (Coronary-GAN 4 ) and Cycle-GAN, the SCPAT-GAN generates virtual Table 1 The FID and PHV scores of SCPAT-GAN, Coronary-GAN, and Cycle-GAN.The PHV scores calculated from different levels of the feature maps PHV1, PHV2, and PHV3.We report evaluation results for normal, pathological, and the whole dataset.All the results are calculated using three-fold cross-validation.stained H&E images of better quality, with lower FID scores and higher PHV scores, for normal, pathological, and the whole dataset.Those scores indicate that virtual stained histology and real histology are perceptually similar.Moreover, we have two experienced pathologists with more than 30 years of experience to evaluate the quality of virtual stained H&E images.The pathologists, who are blind to the true labels, manually identify if an image is real or virtual.
The results of pathologists' evaluation are shown in Fig. 3.Among the total 60 images (half virtual and half real), over half of them (42 images by pathologist A and 33 images by pathologist B) are deemed as "real."For the virtual stained images, more than half (19 images) are deemed as "real" by pathologist A and half (15 images) are deemed as "real" by pathologist B. We calculated the accuracy (pathologist A: 0.56; pathologist B: 0.55), precision (pathologist A: 0.54; pathologist B: 0.54) values of the evaluation results from the two pathologists.We compare the evaluation results with that of random guessing (in theory, accuracy and precision should be 0.5 for an observer who is making choices randomly).We found that the average accuracy (0.55) and precision values (0.55) are close to that of random guessing.The average sensitivity (0.68) is higher, which indicates that the pathologists are capable of identifying real histology images.However, the average specificity (0.43) is lower, which means the virtually stained images are less likely to be identified.Thus, the quality of virtually stained images is close to that of real histology images according to pathologists' justification.Moreover, the intraclass correlation coefficient (ICC) between the evaluation results of the two pathologists is 0.014, which means a low interreader agreement because the images are indistinguishable.

Ablation study
We perform an ablation study by removing the structural constraining (PAT-GAN) or pathology awareness functions (SCT-GAN) or both (T-GAN).The models are retrained and compared with the design of SCPAT-GAN.The results are reported in Table 2.When both structural constraining and pathological awareness models are equipped, SCPAT-GAN reaches the best performance.The ablated model without the structure constraining and pathological awareness modules (T-GAN) reports a compromised performance.

Qualitative analysis
We visually inspect the virtual stained H&E images generated by SCPAT-GAN in Fig. 4. For normal coronary samples, the SCPAT-GAN is capable of generating the three-layer structure; for pathological coronary samples, the SCPAT-GAN is capable of resolving lipid-rich (red arrow) and calcified patterns (yellow star).Compared to real H&E images, virtual stained H&E images generated by SCPAT-GAN show similar patterns for lipid-rich and calcified regions.In contrast, the Coronary-GAN 4 and Cycle-GAN fail to generate pathological patterns.
The proposed SCPAT-GAN allows the generation of 3D virtual H&E volume for both normal and pathological human coronary samples.As shown in Fig. 5, we demonstrate 3D virtual H&E visualization for normal [Fig.5(a)] and pathological [Fig.5(d)] coronary samples.The 3D H&E visualization is impossible to acquire from conventional biochemical staining process, which provides an intuitive way of presenting histological information and reduces the randomness of the H&E sanctioning process. 14

Discussion
In this paper, we design a convolutional transformer-GAN, namely SCPAT-GAN, for generating virtual stained H&E histology from OCT images.Our SCPAT-GAN algorithm is capable of virtually staining OCT images for human coronary samples.The SCPAT-GAN does not require pixel-wisely matched OCT and H&E datasets.By incorporating structural constraining and pathology awareness functions in the SCPAT-GAN, our method outperforms existing methods, which is confirmed by both objective metrics and the pathologist's evaluation.Compared to other Table 2 The ablation study.We remove the pathological awareness module (SCT-GAN), structural constraining module (PAT-GAN), and both modules (T-GAN).The ablated models, SCT-GAN, PAT-GAN, and T-GAN, are retrained.label-free 15 or stain-to-stain 8 works for virtual staining of histology 16 which focuses on top-view images or other image modalities, our SCPAT-GAN is designed for cross-sectional, depthresolved OCT images and human coronary samples.Moreover, the proposed SCPAT-GAN is capable of generating 3D virtual stained H&E visualization for coronary samples, which is impossible to acquire using a conventional biochemical staining process.
As the first study to demonstrate the feasibility of virtual stained histology in OCT images from non-paired training, our study does not focus on computational optimization.In the future, we will further reduce the computational overhead of SCPAT-GAN via lightweight neural network 17 and implement parallel computing for 3D virtual histology.Also, we plan to enable the SCAPT-GAN in intravascular OCT imaging, towards the assistance of percutaneous coronary intervention.Furthermore, we will acquire more data and differentiate pathological patterns to provide fine-grained image-wise labels.Moreover, our current approach still requires image level labels of normal and pathological data and pixel-level layer annotation.We will explore selfsupervised approaches to address this issue.Besides, we will explore the other use-cases of the SCPAT-GAN, such as generating multiple types of virtual staining (e.g.Van Gieson staining, Toluidine blue staining, and Alcian blue staining), and virtual staining of other samples (e.g., human skin and eye).

Conclusion
In this paper, we develop a deep learning model, namely SCPAT-GAN, for generating virtual histology information.Our work is the first to generate virtual H&E images with pathological patterns for coronary samples based on OCT.The proposed framework has great potential to provide real-time histopathological information during an OCT imaging procedure.

Disclosures
The authors declare no conflicts of interest.

1
Network architectureThe design of SCPAT-GAN is shown in Fig.2.The SCPAT-GAN consists of two convolutional transformer generators (G O→H and G H→O ) and two discriminators (D H and D O ).The transformer structure possesses self-focus mechanisms that provide the global context of a given data sample even at the lowest layer.G O→H transfers images from OCT domain to the histology domain; G H→O transfers images from the histology domain to the OCT domain.The two generators share a similar structure.D H is the discriminator for histology images and D O is the discriminator for OCT images.Symbols O and H stand for OCT and histology images respectively.
Fig. 2 (a) The design of the SCPAT-GAN.(b) The scheme of G O→H and D H .The G O→H performs virtual staining based on OCT images.The D H distinguish the virtual histology images from real histology images.The SCPA module guide the virtual staining process by performing structural constraining and pathology awareness functions.(c) Details of the convolutional transformer generator (CTG).The multi-scale features are fed to STB and SCPA modules for virtual staining.

E 6 FIDG
Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 4 ; 7 3 ¼ jμðG O→H ðOÞÞ − μðHÞj 2 − Tr X

Fig. 3
Fig. 3 Result of pathologist's evaluation of real and virtual stained H&E images.The number of images (N) in each quadrant is attached.(a) Evaluation results from pathologist A. (b) Evaluation results from pathologist B.

Fig. 4
Fig. 4 Visual inspection of virtual stained H&E images generated by SCPAT-GAN, Coronary-GAN, 4 and Cycle-GAN.(a) The normal coronary sample.The virtual stained H&E image is very similar to the real H&E image with the three-layer structure resolved.In virtual stained H&E images, the lipid-rich regions appear as white holes.(b) Coronary samples with lipid-rich regions.(c) Coronary samples with calcified regions.The triangle and star represent different texture contrast.The calcified region has a color of dark purple.*The contrasts of OCT images in panels (a)-(c) are enhanced to highlight the texture information for better visualization.The scale bar: 500 μm.